Variational Structured Semantic Inference for Diverse Image Captioning
Despite exciting progress in image captioning, generating diverse captions for a given image remains an open problem. Existing methods typically apply generative models such as the Variational Auto-Encoder to diversify captions, which, however, neglects two key factors of diverse expression: lexical diversity and syntactic diversity. To model these two inherent diversities in image captioning, we propose a Variational Structured Semantic Inferring model (termed VSSI-cap), executed in a novel structured encoder-inferer-decoder schema. VSSI-cap's main innovation is a novel structure, the Variational Multi-modal Inferring tree (termed VarMI-tree). In particular, conditioned on the visual-textual features from the encoder, the VarMI-tree models the lexical and syntactic diversities by inferring their latent variables (with variations) via approximate posterior inference guided by a visual semantic prior. A reconstruction loss and the posterior-prior KL-divergence are then jointly estimated to optimize the VSSI-cap model. Finally, diverse captions are generated from the visual features and the latent variables of this structured encoder-inferer-decoder model. Experiments on the benchmark dataset show that the proposed VSSI-cap achieves significant improvements over the state-of-the-art.
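The abstract states that training jointly estimates a reconstruction loss and the KL divergence between the approximate posterior and the visual semantic prior. For diagonal Gaussian posteriors and priors, that KL term has a standard closed form; the sketch below illustrates it generically (this is not the authors' code, and `gaussian_kl` is a hypothetical helper name):

```python
import math

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, diag(sigma_q^2)) || N(mu_p, diag(sigma_p^2)) ).

    Standard closed form for diagonal Gaussians, summed over dimensions:
    log(sigma_p/sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2) - 1/2
    """
    return sum(
        math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
        for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p)
    )

# Identical posterior and prior yield zero divergence.
print(gaussian_kl([0.0, 0.0], [1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # 0.0
# Shifting the posterior mean by 1 against a unit Gaussian prior costs 0.5 nats.
print(gaussian_kl([1.0], [1.0], [0.0], [1.0]))  # 0.5
```

In a VAE-style objective such as the one described here, this KL term regularizes the inferred lexical/syntactic latent variables toward the prior, while the reconstruction loss ties them to the observed captions.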
Reviews: Variational Structured Semantic Inference for Diverse Image Captioning
Originality: The proposed approach, which models syntactic and lexical diversity within the latent space to generate diverse image captions, is novel. Quality: To establish that the generated captions are diverse, various standard diversity metrics are measured for the proposed method in Tab. 2. Qualitative results demonstrating diverse captions, and diversity conditioned on different visual parse tree probabilities, are shown in Figures 1 and 6. These experiments help justify the core components of the proposed approach. Clarity: The paper is well written and easy to follow. Careful illustrations in Figures 1 and 3 are used as an aid while describing the proposed method.
After considering the author response and discussing the submission, the reviewers all voted to accept. The approach puts forward a novel framing of caption diversity, and the empirical evaluation supports the paper's contributions. The document as a whole could use additional clarity, so I urge the authors to spend time revising it to broaden the impact of this work.
Fuhai Chen, Rongrong Ji, Jiayi Ji, Xiaoshuai Sun, Baochang Zhang, Xuri Ge, Yongjian Wu, Feiyue Huang, Yan Wang